This report summarizes our work for this group assignment. The work is organized into two chapters:
Exploratory Analysis.
Statistical Analysis of the Rodent Taxon.
Identify potential variables of interest:
##    record_id         month             day            year
##  Min.   :    1   Min.   : 1.000   Min.   : 1.0   Min.   :1977
##  1st Qu.: 8964   1st Qu.: 4.000   1st Qu.: 9.0   1st Qu.:1984
##  Median :17762   Median : 6.000   Median :16.0   Median :1990
##  Mean   :17804   Mean   : 6.474   Mean   :16.1   Mean   :1990
##  3rd Qu.:26655   3rd Qu.:10.000   3rd Qu.:23.0   3rd Qu.:1997
##  Max.   :35548   Max.   :12.000   Max.   :31.0   Max.   :2002
##
##     plot_id       species_id            sex            hindfoot_length
##  Min.   : 1.00   Length:34786       Length:34786       Min.   : 2.00
##  1st Qu.: 5.00   Class :character   Class :character   1st Qu.:21.00
##  Median :11.00   Mode  :character   Mode  :character   Median :32.00
##  Mean   :11.34                                         Mean   :29.29
##  3rd Qu.:17.00                                         3rd Qu.:36.00
##  Max.   :24.00                                         Max.   :70.00
##                                                        NA's   :3348
##
##      weight          genus             species              taxa
##  Min.   :  4.00   Length:34786       Length:34786       Length:34786
##  1st Qu.: 20.00   Class :character   Class :character   Class :character
##  Median : 37.00   Mode  :character   Mode  :character   Mode  :character
##  Mean   : 42.67
##  3rd Qu.: 48.00
##  Max.   :280.00
##  NA's   :2503
##
##   plot_type
##  Length:34786
##  Class :character
##  Mode  :character
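The summary above reports missing values (NA's) for hindfoot length and weight. The following is a minimal sketch of how such counts could be checked per column and per taxon. The data frame `surveys_demo` and its values are hypothetical and are not taken from the study data.

```r
# Hypothetical miniature of the survey table; all values are made up.
surveys_demo <- data.frame(
  taxa            = c("Rodent", "Rodent", "Bird", "Rabbit", "Rodent"),
  hindfoot_length = c(32, NA, NA, NA, 36),
  weight          = c(40, 45, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(surveys_demo))
print(na_per_column)

# Number of non-missing hindfoot measurements within each taxon
measured_by_taxon <- tapply(!is.na(surveys_demo$hindfoot_length),
                            surveys_demo$taxa, sum)
print(measured_by_taxon)
```

A taxon whose count of non-missing measurements is zero cannot contribute to any analysis of that variable.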
Figure 1.3 | Taxa observations spanning 26 years (1977 to 2002). Rodents were observed consistently every year, with observations peaking in 1977. Birds and rabbits show a similar pattern but lack observations for a couple of years. Reptiles were observed in only eight non-consecutive years.
Figure 1.4 | Distribution of species average weight from 1977 to 2002. Only the rodent taxon has complete average-weight information, so the species IDs shown here are rodent species. Most species have a low average weight (below 50 g), and not every species is represented consistently across the period. NL has the highest average weight and is represented consistently throughout the years.
Preliminary investigation showed that Rodent is the only taxon for which hindfoot length and sex were recorded. Consequently, every graph involving hindfoot length or sex covers rodent species only. The number of rodents in the study is shown below.
Figure 2.1 | Pie chart of the distribution of the 29 rodent species. Dipodomys merriami is the most abundant species, accounting for 30.90% of the total. The second most abundant is Chaetodipus penicillatus (9.12%), followed by Dipodomys ordii (8.84%). The three least abundant species are Chaetodipus sp., Reithrodontomys sp. and Spermophilus tereticaudus.
Figure 2.2 | Bar plot of the sex distribution of rodents. There are 17348 males and 15690 females, so males outnumber females by 1658.
## species_id hfl_mean hfl_sd
## 1 BA 13.00000 1.718879
## 2 DM 35.98283 1.464788
## 3 DO 35.60714 1.665163
## 4 DS 49.94880 2.084383
## 5 NL 32.25746 1.791916
## 6 OL 20.53377 1.434295
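A table of per-species means and standard deviations like the one above could be produced along the following lines. This is only a sketch on a hypothetical data frame (`demo`, with made-up values); the column names `hfl_mean` and `hfl_sd` follow the output shown, but the code the authors actually used is not given in the report.

```r
# Hypothetical measurements; the real data set is not reproduced here.
demo <- data.frame(
  species_id      = c("BA", "BA", "DM", "DM", "DM"),
  hindfoot_length = c(12, 14, 35, 36, NA)
)

# Keep only rows with a recorded hindfoot length
complete <- demo[!is.na(demo$hindfoot_length), ]

# Mean and standard deviation of hindfoot length within each species
hfl_mean <- aggregate(hindfoot_length ~ species_id, data = complete, FUN = mean)
hfl_sd   <- aggregate(hindfoot_length ~ species_id, data = complete, FUN = sd)

hfl_summary <- data.frame(
  species_id = hfl_mean$species_id,
  hfl_mean   = hfl_mean$hindfoot_length,
  hfl_sd     = hfl_sd$hindfoot_length
)
print(hfl_summary)
```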
Figure 2.3 | Bar plot of average hindfoot length by rodent species ID. DS has the longest average hindfoot length (about 50 mm), whereas BA has the shortest (about 13 mm). Most of the species are within the range of 18-22 mm.
This plot suggests that hindfoot length (mm) differs by species ID. We conducted several tests to determine whether this is the case, beginning with a Bartlett test of homogeneity of variances to check the equal-variance assumption of ANOVA.
##
## Bartlett test of homogeneity of variances
##
## data: hindfoot_length by species_id
## Bartlett's K-squared = 2081.4, df = 22, p-value < 2.2e-16
Because the p-value (< 2.2e-16) is less than the alpha of 0.05, we reject the null hypothesis that the variance of hindfoot length is equal across species IDs. The data therefore do not meet the equal-variance assumption of ANOVA, so we used a non-parametric alternative, the Kruskal-Wallis rank sum test.
##
## Kruskal-Wallis rank sum test
##
## data: hindfoot_length by as.factor(species_id)
## Kruskal-Wallis chi-squared = 28681, df = 22, p-value < 2.2e-16
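The two-step procedure above (check equal variances with Bartlett's test, then fall back to Kruskal-Wallis) can be sketched as follows. The data are simulated with `rnorm()` under assumed group means and standard deviations, so the numbers do not correspond to the study results; only `bartlett.test()` and `kruskal.test()` mirror the calls visible in the output.

```r
set.seed(1)

# Simulated stand-in data: three hypothetical species with unequal spread
sim <- data.frame(
  species_id      = factor(rep(c("A", "B", "C"), each = 50)),
  hindfoot_length = c(rnorm(50, mean = 20, sd = 1),
                      rnorm(50, mean = 35, sd = 2),
                      rnorm(50, mean = 50, sd = 6))
)

# Step 1: test the equal-variance assumption of ANOVA
bart <- bartlett.test(hindfoot_length ~ species_id, data = sim)

# Step 2: if variances differ, use the rank-based Kruskal-Wallis test instead
alpha <- 0.05
if (bart$p.value < alpha) {
  kw <- kruskal.test(hindfoot_length ~ species_id, data = sim)
  print(kw)
}
```

The degrees of freedom reported by both tests equal the number of groups minus one, which is why the output above shows df = 22.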
Because the p-value (< 2.2e-16) is less than the alpha of 0.05, we reject the null hypothesis that hindfoot length follows the same distribution in all species. However, this test does not tell us which species differ from which others. To determine this, we conducted non-parametric pairwise Wilcoxon rank sum tests. In the output below, pairs with Holm-adjusted p-values < 0.05 are significantly different from each other, and for those pairs we reject the null hypothesis that the two species have the same hindfoot length distribution.
##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: surveys_hfoot_id$hindfoot_length and as.factor(surveys_hfoot_id$species_id)
##
## BA DM DO DS NL OL OT OX PB
## DM < 2e-16 - - - - - - - -
## DO < 2e-16 < 2e-16 - - - - - - -
## DS < 2e-16 < 2e-16 < 2e-16 - - - - - -
## NL < 2e-16 < 2e-16 < 2e-16 < 2e-16 - - - - -
## OL < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 - - - -
## OT < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 5.5e-09 - - -
## OX 0.00474 3.9e-05 4.3e-05 6.1e-05 5.2e-05 1.00000 1.00000 - -
## PB < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.9e-05 -
## PE < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.4e-08 1.00000 1.00000 < 2e-16
## PF < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00588 < 2e-16
## PH 9.3e-12 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00085 1.00000
## PI 0.00028 3.9e-05 4.3e-05 6.1e-05 6.0e-05 0.00041 6.2e-05 0.03353 2.3e-05
## PL 2.6e-11 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.00000 1.00000 1.00000 < 2e-16
## PM < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.35600 0.01793 1.00000 < 2e-16
## PP < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00289 < 2e-16
## RF < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.02961 < 2e-16
## RM < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00531 < 2e-16
## RO 0.00474 3.9e-05 4.3e-05 6.1e-05 5.1e-05 4.0e-05 2.4e-05 0.47563 1.9e-05
## RX 0.48931 0.46181 0.46746 0.48117 0.47563 1.00000 1.00000 1.00000 0.38031
## SF 2.2e-13 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00085 0.74977
## SH < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00016 < 2e-16
## SO 1.1e-13 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00362 1.00000
## PE PF PH PI PL PM PP RF RM
## DM - - - - - - - - -
## DO - - - - - - - - -
## DS - - - - - - - - -
## NL - - - - - - - - -
## OL - - - - - - - - -
## OT - - - - - - - - -
## OX - - - - - - - - -
## PB - - - - - - - - -
## PE - - - - - - - - -
## PF < 2e-16 - - - - - - - -
## PH < 2e-16 < 2e-16 - - - - - - -
## PI 7.7e-05 1.9e-05 0.00336 - - - - - -
## PL 1.00000 < 2e-16 2.1e-10 0.00079 - - - - -
## PM 0.00718 < 2e-16 < 2e-16 0.00011 1.00000 - - - -
## PP < 2e-16 < 2e-16 < 2e-16 0.74977 9.6e-12 < 2e-16 - - -
## RF < 2e-16 < 2e-16 1.5e-14 6.9e-05 6.3e-13 < 2e-16 < 2e-16 - -
## RM < 2e-16 < 2e-16 < 2e-16 1.8e-05 < 2e-16 < 2e-16 < 2e-16 < 2e-16 -
## RO 3.6e-05 1.00000 0.00085 0.03353 0.00132 2.6e-05 3.2e-05 0.00207 0.35077
## RX 1.00000 0.51958 0.56697 0.95423 1.00000 1.00000 0.56517 1.00000 1.00000
## SF < 2e-16 < 2e-16 1.00000 0.01315 4.4e-11 < 2e-16 < 2e-16 < 2e-16 < 2e-16
## SH < 2e-16 < 2e-16 3.1e-07 0.00046 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16
## SO < 2e-16 < 2e-16 1.00000 0.04950 3.4e-09 < 2e-16 < 2e-16 2.8e-15 < 2e-16
## RO RX SF SH
## DM - - - -
## DO - - - -
## DS - - - -
## NL - - - -
## OL - - - -
## OT - - - -
## OX - - - -
## PB - - - -
## PE - - - -
## PF - - - -
## PH - - - -
## PI - - - -
## PL - - - -
## PM - - - -
## PP - - - -
## RF - - - -
## RM - - - -
## RO - - - -
## RX 1.00000 - - -
## SF 0.00060 0.56697 - -
## SH 0.00013 0.51958 0.00101 -
## SO 0.00091 0.78274 1.00000 1.7e-06
##
## P value adjustment method: holm
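The pairwise comparison and the Holm adjustment named in the output can be illustrated with a small sketch. The grouped data are simulated and the three raw p-values in `raw_p` are hypothetical, chosen only to make the adjustment easy to follow by hand.

```r
set.seed(4)

# Simulated stand-in data for three hypothetical species
sim <- data.frame(
  species_id      = factor(rep(c("A", "B", "C"), each = 40)),
  hindfoot_length = c(rnorm(40, mean = 20, sd = 1),
                      rnorm(40, mean = 35, sd = 2),
                      rnorm(40, mean = 50, sd = 3))
)

# All pairwise Wilcoxon rank sum tests, with Holm-adjusted p-values
pw <- pairwise.wilcox.test(sim$hindfoot_length, sim$species_id,
                           p.adjust.method = "holm")
print(pw)

# Holm adjustment on its own: the smallest p-value is multiplied by m,
# the next by m - 1, and so on, and the results are forced to be
# non-decreasing and capped at 1.
raw_p  <- c(0.01, 0.04, 0.03)
holm_p <- p.adjust(raw_p, method = "holm")
print(holm_p)  # 0.03 0.06 0.06
```

Because the adjustment depends on the number of comparisons, a pair can be significant in isolation yet not after adjustment, as the second and third hypothetical p-values show.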
Figure 2.4 | Bar plot of average hindfoot length by plot type. The Control and Spectab exclosure plot types have similar average rodent hindfoot lengths.
This plot suggests that hindfoot length (mm) differs by plot type. We used the same techniques as above to determine whether this is the case, beginning with a Bartlett test of homogeneity of variances to check the equal-variance assumption of ANOVA.
##
## Bartlett test of homogeneity of variances
##
## data: hindfoot_length by plot_type
## Bartlett's K-squared = 1069.6, df = 4, p-value < 2.2e-16
Because the p-value (2.93e-230) is less than the alpha of 0.05, we reject the null hypothesis that the variance of hindfoot length is equal across plot types. The data therefore do not meet the equal-variance assumption of ANOVA, so, as above, we used the non-parametric Kruskal-Wallis rank sum test.
##
## Kruskal-Wallis rank sum test
##
## data: hindfoot_length by as.factor(plot_type)
## Kruskal-Wallis chi-squared = 5808.5, df = 4, p-value < 2.2e-16
Because the p-value (< 2.2e-16) is less than the alpha of 0.05, we reject the null hypothesis that rodent hindfoot length follows the same distribution in all plot types. However, this test does not tell us which plot types differ from which others. To determine this, we conducted non-parametric pairwise Wilcoxon rank sum tests. In the output below, every Holm-adjusted p-value is < 0.05, so all pairs of plot types differ significantly and we reject the null hypothesis of equal hindfoot length distributions for each pair.
##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: surveys_hfoot_plottype$hindfoot_length and as.factor(surveys_hfoot_plottype$plot_type)
##
##                           Control Long-term Krat Exclosure
## Long-term Krat Exclosure  <2e-16  -
## Rodent Exclosure          <2e-16  1e-05
## Short-term Krat Exclosure <2e-16  <2e-16
## Spectab exclosure         <2e-16  <2e-16
##                           Rodent Exclosure Short-term Krat Exclosure
## Long-term Krat Exclosure  -                -
## Rodent Exclosure          -                -
## Short-term Krat Exclosure <2e-16           -
## Spectab exclosure         <2e-16           <2e-16
##
## P value adjustment method: holm
Figure 2.5 | The stacked bar plot in the left panel shows the total number of rodents captured per year by species. DM has the highest count from 1977 to 1999, and PM overtakes it from 2000 to 2002. The right panel shows the sex distribution by species ID. Overall, most species have an even split between females and males, except OX, PI and RX, for which only males were recorded. PX has one female and one male, which is too few to be visible on the plot.
Figure 2.6 | Weight density distribution by sex for the five plot types. In most plot types the density is concentrated at the low end, with most weights of both sexes between 0 and 100 g. In the Spectab exclosure plot type, however, males outweigh females. In the Control, Rodent Exclosure and Short-term Krat Exclosure plot types, females have a lower density but tend to be heavier than males.
Figure 2.7 | Hindfoot length density distribution by sex for the five plot types. Control and Rodent Exclosure show a similar pattern, and the female and male curves overlap in both plots. In Spectab exclosure, males are more densely concentrated at hindfoot lengths of 30-40 mm.
Figure 2.8 | Species density distribution by year for the five plot types. In most plot types, species are concentrated between 1985 and 1995. The density for Spectab exclosure tends to be skewed toward the right.
Figure 2.9 | Correlation of hindfoot length and weight by species IDs.
This suggests a relationship between hindfoot length (mm) and weight (g) in rodents. We fitted a linear model to test whether such a relationship exists.
##
## Call:
## lm(formula = hindfoot_length ~ weight, data = rodent_complete)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.400 -5.584 -0.509 6.125 36.028
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.572622 0.061245 352.2 <2e-16 ***
## weight 0.182831 0.001115 164.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.964 on 30674 degrees of freedom
## Multiple R-squared: 0.4673, Adjusted R-squared: 0.4673
## F-statistic: 2.69e+04 on 1 and 30674 DF, p-value: < 2.2e-16
Because the p-value (< 2e-16) is less than the alpha of 0.05, we reject the null hypothesis that the slope of the linear regression model is zero. The estimated slope is 0.183, meaning each additional gram of weight is associated with an increase of about 0.18 in hindfoot length. The multiple R-squared value describes how well the model explains variation in the response; here, weight explains 46.73% of the variation in hindfoot length.
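To make the role of the slope p-value and R-squared concrete, here is a minimal sketch that fits the same kind of model on simulated data and recomputes R-squared from its definition. The data frame `rodent_demo` and the coefficients used to simulate it are assumptions for illustration only.

```r
set.seed(2)

# Simulated stand-in data with an assumed linear relationship plus noise
weight          <- runif(200, min = 5, max = 250)
hindfoot_length <- 20 + 0.2 * weight + rnorm(200, sd = 5)
rodent_demo     <- data.frame(weight, hindfoot_length)

# Same model form as in the report: hindfoot_length ~ weight
fit         <- lm(hindfoot_length ~ weight, data = rodent_demo)
fit_summary <- summary(fit)

# p-value for the null hypothesis that the slope is zero
slope_p <- coef(fit_summary)["weight", "Pr(>|t|)"]

# R-squared by hand: 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((rodent_demo$hindfoot_length - mean(rodent_demo$hindfoot_length))^2)
r_squared_manual <- 1 - rss / tss

print(c(manual = r_squared_manual, reported = fit_summary$r.squared))
```

A small p-value for the slope and a moderate R-squared can coexist: with many observations, even a relationship that leaves much of the variation unexplained is detected reliably.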
Figure 2.10 | Rodent weight distribution (g) for female and male according to species IDs.
This suggests a relationship between sex and weight (g) in rodents. We conducted a Welch two-sample t-test to compare the weights of males with those of females.
##
## Welch Two Sample t-test
##
## data: Rodent_Female$weight and Rodent_Male$weight
## t = -2.0226, df = 31751, p-value = 0.04312
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.62412287 -0.02552529
## sample estimates:
## mean of x mean of y
## 42.17055 42.99538
Because the p-value (0.0431195) is less than the alpha of 0.05, we reject the null hypothesis that the mean weights of the two sexes are the same. In the output above, “mean of x” is the mean weight of females and “mean of y” is the mean weight of males, so males are about 0.82 g heavier on average.
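The output above is labelled a Welch test because R's `t.test()` does not assume equal variances by default. The sketch below reproduces the Welch statistic and degrees of freedom by hand on simulated weights. The vectors `female_w` and `male_w` are hypothetical stand-ins for `Rodent_Female$weight` and `Rodent_Male$weight`, and their means and standard deviations are assumed.

```r
set.seed(3)

# Simulated stand-in weights (g); not the study data
female_w <- rnorm(300, mean = 42, sd = 35)
male_w   <- rnorm(350, mean = 43, sd = 38)

# Welch two-sample t-test (var.equal = FALSE is the default)
welch <- t.test(female_w, male_w)

# The same statistic by hand: difference in means over its standard error
v_f <- var(female_w) / length(female_w)
v_m <- var(male_w) / length(male_w)
t_manual <- (mean(female_w) - mean(male_w)) / sqrt(v_f + v_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (v_f + v_m)^2 /
  (v_f^2 / (length(female_w) - 1) + v_m^2 / (length(male_w) - 1))

print(c(t = t_manual, df = df_manual))
print(welch)
```

The non-integer degrees of freedom are the visible signature of the Welch correction; a classical Student's t-test would instead require `var.equal = TRUE`.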